1. Load Packages

source("./Mean Reversion/RMR.001 Load Packages.R") 

2. Load Data

pricing_data <- read_csv("./Mean Reversion/Raw Data/pricing data.csv") 
## Parsed with column specification:
## cols(
##   date_unix = col_integer(),
##   date_time = col_datetime(format = ""),
##   high = col_double(),
##   low = col_double(),
##   open = col_double(),
##   close = col_double(),
##   volume = col_double(),
##   quote_volume = col_double(),
##   weighted_average = col_double(),
##   currency_pair = col_character(),
##   period = col_integer()
## )

3. Prepare Data Function

Description
Spreads Poloniex pricing data into wide format and filters data to a specified time resolution and time window.

Arguments
pricing_data: A dataframe containing pricing data from Poloniex gathered in tidy format.
time_resolution: The number of seconds that each observation spans. Takes values 300, 900, 1800, 7200, 14400, and 86400.
start_date: The start date of the time window.
end_date: The end date of the time window.

prepare_data <- function(pricing_data, time_resolution, start_date, end_date) { 
  df <- pricing_data %>% 
    filter(period == time_resolution, 
           date_time >= start_date, 
           date_time <= end_date) %>% 
    select(date_unix, date_time, close, currency_pair) %>% 
    spread(currency_pair, close) 
  return(df)
} 
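
As a minimal sketch of what prepare_data() does to the data, the example below applies the same filter, select, and spread steps to a tiny hand-made table. The toy tibble and its values are hypothetical and only mimic the column layout shown in the parsed column specification above; dplyr and tidyr are assumed to be among the packages loaded in step 1.

```r
library(dplyr)
library(tidyr)

# Hypothetical tidy input: two coins, two 15-minute candles each.
toy <- tibble(
  date_unix = rep(c(1504224000L, 1504224900L), times = 2),
  date_time = as.POSIXct(date_unix, origin = "1970-01-01", tz = "UTC"),
  close = c(0.0700, 0.0710, 0.0120, 0.0121),
  currency_pair = rep(c("BTC_ETH", "BTC_LTC"), each = 2),
  period = 900L
)

# Same steps as prepare_data(), without the date window.
wide <- toy %>%
  filter(period == 900) %>%
  select(date_unix, date_time, close, currency_pair) %>%
  spread(currency_pair, close)

print(wide)
```

Each currency pair becomes its own column of closing prices, with one row per timestamp, which is the shape the later functions index with train[[coin_y]] and train[[coin_x]].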

4. Test Cointegration Function

Description
The Engle-Granger method is used to test for cointegration. This method consists of two steps: (1) Perform a linear regression of log(coin_y) on log(coin_x). (2) Perform an Augmented Dickey-Fuller (ADF) test on the residuals from the linear regression estimated in (1). The ADF test is specified with a non-zero mean, no time-based trend, and one autoregressive lag. The function returns the ADF test statistic.

Arguments
coin_y: A vector containing the pricing data for the dependent coin in the regression.
coin_x: A vector containing the pricing data for the independent coin in the regression.

test_cointegration <- function(coin_y, coin_x) { 
  lm_model <- lm(log(coin_y) ~ log(coin_x))  
  lm_residuals <- lm_model[["residuals"]] 
  adf_test <- ur.df(lm_residuals, type = "drift", lags = 1) 
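  # Row 2 of the test regression is the lagged level term; column 3 is its t value, the ADF statistic.
  df_stat <- adf_test@testreg[["coefficients"]][2, 3]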
  return(df_stat) 
} 
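
The two Engle-Granger steps can be tried on simulated data. The sketch below is an illustration under stated assumptions, not the author's test: the series log_x and log_y are hypothetical, built so that they are cointegrated by construction, and the urca package is assumed to be installed. It also checks that the coefficient-table lookup used in test_cointegration() returns the same number that ur.df() stores in its teststat slot.

```r
library(urca)

set.seed(42)
n <- 500
log_x <- cumsum(rnorm(n, sd = 0.01))            # random walk (non-stationary)
log_y <- 1 + 0.8 * log_x + rnorm(n, sd = 0.02)  # same trend plus stationary noise

# Step 1: regress log(y) on log(x).
fit <- lm(log_y ~ log_x)

# Step 2: ADF test on the residuals with drift and one lag.
adf <- ur.df(residuals(fit), type = "drift", lags = 1)

stat_from_testreg <- adf@testreg[["coefficients"]][2, 3]
stat_from_slot <- adf@teststat[1, 1]

print(c(testreg = stat_from_testreg, teststat = stat_from_slot))
```

A strongly negative statistic is evidence against a unit root in the residuals, which is why the later selection step keeps only pairs below a negative cutoff.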

5. Create Coin Pairs Function

Description
Two sets of currency pairs are examined: currency pairs where USDT is the quote currency and currency pairs where BTC is the quote currency. All combinations of coins are created within a given quote currency. Combinations that consist of the coin with itself are removed. The function returns a dataframe containing the coin pairs.

Arguments
quote_currency: A string indicating the quote currency of the currency pairs. Can take values USDT or BTC.

create_pairs <- function(quote_currency) { 
  if (quote_currency == "USDT") { 
    coin_list <- c("USDT_BTC", "USDT_DASH", "USDT_ETH", "USDT_LTC", "USDT_REP", "USDT_XMR", "USDT_ZEC")
  } 
  if (quote_currency == "BTC") { 
    coin_list <- c("BTC_DASH", "BTC_ETH", "BTC_LTC", "BTC_REP", "BTC_XEM", "BTC_XMR", "BTC_ZEC")
  } 
  coin_pairs <- expand.grid(coin_list, coin_list) %>% 
    rename(coin_y = Var1, 
           coin_x = Var2) %>% 
    filter(coin_y != coin_x) %>% 
    mutate_if(is.factor, as.character) %>%
    as_tibble() 
  return(coin_pairs)
} 
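
A small base R sketch of the pairing step, using a hypothetical three-coin list instead of the seven coins in create_pairs(). It shows that the result contains ordered pairs: both (A, B) and (B, A) are kept, because regressing y on x is a different model from regressing x on y.

```r
# Hypothetical three-coin list; create_pairs() uses seven coins per quote currency.
coin_list <- c("BTC_ETH", "BTC_LTC", "BTC_XMR")

all_pairs <- expand.grid(coin_y = coin_list,
                         coin_x = coin_list,
                         stringsAsFactors = FALSE)

# Drop the rows that pair a coin with itself.
coin_pairs <- all_pairs[all_pairs$coin_y != all_pairs$coin_x, ]

n_pairs <- nrow(coin_pairs)
print(coin_pairs)
```

With n coins this leaves n * (n - 1) ordered pairs, so the seven-coin lists in create_pairs() produce 42 candidate pairs each.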

6. Test Coin Pairs Function

Description
Test for cointegration between each coin pair generated by the create_pairs() function. The test for cointegration is performed by the test_cointegration() function. The function returns a dataframe containing the coin pairs and the ADF test statistic resulting from testing cointegration between each coin pair.

Arguments
train: A dataframe generated by prepare_data() that represents the training set for the coin pairs.
coin_pairs: A dataframe generated by create_pairs().

test_pairs <- function(train, coin_pairs) { 
  adf_stat <- c() 
  for (n in 1:nrow(coin_pairs)) { 
    coin_y <- coin_pairs[[n, "coin_y"]] 
    coin_x <- coin_pairs[[n, "coin_x"]] 
    cointegration_results <- test_cointegration(coin_y = train[[coin_y]], coin_x = train[[coin_x]])
    adf_stat <- c(adf_stat, cointegration_results)
  } 
  df <- coin_pairs %>% 
    mutate(adf_stat = adf_stat) %>% 
    arrange(adf_stat)
  return(df) 
} 

7. Select Coin Pairs Function

Description
Select cointegrated coin pairs to be used in a mean reversion strategy. The current selection logic keeps all coin pairs whose ADF test statistic is less than or equal to -3.43.

Arguments
train: A dataframe generated by prepare_data() that represents the training set for the coin pair.
coin_pairs: A dataframe generated by create_pairs().

select_pairs <- function(train, coin_pairs) { 
  set.seed(5) 
  df <- test_pairs(train = train, coin_pairs = coin_pairs) %>% 
    filter(adf_stat <= -3.43)
  return(df) 
} 

8. Generate Signals Function

Description
Generate trading signals that indicate the current position in the spread formed by a linear combination of coin y and coin x. A positive signal indicates a long position in the spread, 0 indicates a flat position, and a negative signal indicates a short position in the spread. The signal ranges from -1 to +1 in steps of 0.25. Signals are generated for the test set using a model trained on the training set.

The current trading logic is to perform a linear regression of log(coin y) on log(coin x) using the training set. A spread is then calculated in the test set using the fitted hedge ratio and intercept from the regression. The z-score of the spread is then calculated using the mean and standard deviation of the training-set residuals. The position is sized from the previous period's z-score and always bets on reversion toward 0: 0.25 when the absolute z-score is below 1, 0.50 between 1 and 2, 0.75 between 2 and 3, and 1.00 between 3 and 4, long when the z-score is negative and short when it is positive. The position is closed when the absolute z-score reaches 4 and is re-entered when the z-score returns to within the -4 to +4 range. The threshold_z argument is accepted but is not used by this banded logic.

Arguments
train: A dataframe generated by prepare_data() that represents the training set for the coin pair.
test: A dataframe generated by prepare_data() that represents the test set for the coin pair.
coin_y: A string indicating the dependent coin in the coin pair regression.
coin_x: A string indicating the independent coin in the coin pair regression.
threshold_z: A number indicating the absolute value of the z-score threshold for entering a position in the spread.

generate_signals <- function(train, test, coin_y, coin_x, threshold_z) { 
  model <- lm(log(train[[coin_y]]) ~ log(train[[coin_x]]))    
  intercept <- coef(model)[1] 
  hedge_ratio <- coef(model)[2] 
  df_signals <- test %>% 
    mutate(spread = log(test[[coin_y]]) - log(test[[coin_x]]) * hedge_ratio - intercept, 
           spread_z = (spread - mean(model[["residuals"]])) / sd(model[["residuals"]]), 
           lag_spread_z = lag(spread_z, 1), 
           signal_long = ifelse(lag_spread_z <=  0 & lag_spread_z > -1, 0.25, 0), 
           signal_long = ifelse(lag_spread_z <= -1 & lag_spread_z > -2, 0.50, signal_long), 
           signal_long = ifelse(lag_spread_z <= -2 & lag_spread_z > -3, 0.75, signal_long), 
           signal_long = ifelse(lag_spread_z <= -3 & lag_spread_z > -4, 1.00, signal_long), 
           signal_long = ifelse(lag_spread_z <= -4, 0, signal_long), 
           signal_short = ifelse(lag_spread_z >= 0 & lag_spread_z < 1, -0.25, 0), 
           signal_short = ifelse(lag_spread_z >= 1 & lag_spread_z < 2, -0.50, signal_short), 
           signal_short = ifelse(lag_spread_z >= 2 & lag_spread_z < 3, -0.75, signal_short), 
           signal_short = ifelse(lag_spread_z >= 3 & lag_spread_z < 4, -1.00, signal_short), 
           signal_short = ifelse(lag_spread_z >= 4, 0, signal_short), 
           signal = signal_long + signal_short, 
           signal = ifelse(is.na(signal), 0, signal)) 
  return(df_signals[["signal"]])
} 
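
The chain of ifelse() calls in generate_signals() maps the lagged z-score to a position size in bands. The sketch below restates that mapping as a single expression so it can be checked on a few values. band_signal() is a hypothetical helper written for this illustration; it is not part of the original script.

```r
# Compact restatement of the ifelse() chain in generate_signals().
# band_signal() is a hypothetical helper, not part of the original script.
band_signal <- function(z) {
  # Position size grows by 0.25 per unit of |z|: 0.25, 0.50, 0.75, 1.00.
  size <- 0.25 * (floor(abs(z)) + 1)
  # Flat at or beyond +/-4; otherwise take the side opposite to the z-score.
  signal <- ifelse(abs(z) >= 4, 0, -sign(z) * size)
  # The first lagged z-score is NA, so the strategy starts flat.
  ifelse(is.na(signal), 0, signal)
}

z <- c(-4.5, -3.5, -0.5, 0, 0.5, 1, 2.5, 4, NA)
signals <- band_signal(z)
print(data.frame(z = z, signal = signals))
```

A z-score of exactly 0 gives a flat position, which matches the original code, where the long and short quarter positions cancel at 0.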

9. Backtest Pair Function

Description
Calculate the return of a cointegration-based mean reversion trading strategy using coin y and coin x.

The current backtesting logic uses signals generated by generate_signals(). The coin_y_return and coin_x_return indicate the one period percentage return of each coin. The coin_y_position and coin_x_position indicate the market value in USD in each coin. coin_y_pnl and coin_x_pnl indicate the USD value of the profit and loss for each coin. The combined_position indicates the gross market value of the combined positions.

Arguments
train: A dataframe generated by prepare_data() that represents the training set for the coin pair.
test: A dataframe generated by prepare_data() that represents the test set for the coin pair.
coin_y: A string indicating the dependent coin in the coin pair regression.
coin_x: A string indicating the independent coin in the coin pair regression.
threshold_z: A number indicating the absolute value of the z-score threshold for entering a position in the spread.

backtest_pair <- function(train, test, coin_y, coin_x, threshold_z) { 
  model <- lm(log(train[[coin_y]]) ~ log(train[[coin_x]]))   
  intercept <- coef(model)[1] 
  hedge_ratio <- coef(model)[2] 
  df_backtest <- test %>% 
    mutate(signal = generate_signals(train = train, 
                                     test = test, 
                                     coin_y = coin_y, 
                                     coin_x = coin_x, 
                                     threshold_z = threshold_z), 
           coin_y_return = test[[coin_y]] / lag(test[[coin_y]], 1) - 1, 
           coin_x_return = test[[coin_x]] / lag(test[[coin_x]], 1) - 1, 
           coin_y_position = signal * 1           *  1, 
           coin_x_position = signal * hedge_ratio * -1,  
           coin_y_pnl = lag(coin_y_position, 1) * coin_y_return, 
           coin_x_pnl = lag(coin_x_position, 1) * coin_x_return, 
           combined_position = abs(coin_y_position) + abs(coin_x_position), 
           combined_pnl = coin_y_pnl + coin_x_pnl, 
           combined_return = combined_pnl / (1 + hedge_ratio)) %>% 
    mutate_all(funs(ifelse(is.na(.), 0, .))) %>% 
    mutate(return_pair = cumprod(1 + combined_return)) 
  return(df_backtest[["return_pair"]])
} 
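
The return arithmetic in backtest_pair() can be followed by hand on three periods. The prices, signal, and hedge ratio below are made-up values chosen for easy arithmetic, and lag1() is a hypothetical base R stand-in for dplyr's lag().

```r
# Hypothetical stand-in for dplyr::lag(x, 1).
lag1 <- function(x) c(NA, head(x, -1))

hedge_ratio <- 0.5            # made-up value
price_y <- c(100, 102, 101)   # made-up prices
price_x <- c(50, 50.5, 50)
signal <- c(1, 1, 0)          # long the spread for two periods, then flat

ret_y <- price_y / lag1(price_y) - 1
ret_x <- price_x / lag1(price_x) - 1

pos_y <- signal * 1                  # long coin y when long the spread
pos_x <- signal * hedge_ratio * -1   # short hedge_ratio units of coin x

# P&L in a period comes from the position held at the end of the previous period.
pnl <- lag1(pos_y) * ret_y + lag1(pos_x) * ret_x

combined_return <- pnl / (1 + hedge_ratio)
combined_return[is.na(combined_return)] <- 0
return_pair <- cumprod(1 + combined_return)

print(return_pair)
```

In the second period coin y gains 2% and coin x gains 1%, so the P&L is 0.02 - 0.5 * 0.01 = 0.015, and dividing by 1 + hedge_ratio = 1.5 gives a 1% return. One reading of that divisor is that it scales P&L by the gross exposure of a full-size position, which assumes a positive hedge ratio.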

10. Backtest Strategy Function

Description
Calculate the return of a cointegration-based mean reversion trading strategy using an equally weighted portfolio of cointegrated coin pairs.

Arguments
train: A dataframe generated by prepare_data() that represents the training set for the coin pair.
test: A dataframe generated by prepare_data() that represents the test set for the coin pair.
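selected_pairs: A dataframe generated by select_pairs() that represents a set of cointegrated coin pairs.
threshold_z: A number indicating the absolute value of the z-score threshold for entering a position in the spread.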

backtest_strategy <- function(train, test, selected_pairs, threshold_z) { 
  if (nrow(selected_pairs) == 0) { 
    return(1) 
  } 
  df <- tibble()  
  for (i in 1:nrow(selected_pairs)) { 
    single_pair <- tibble(
      return_pair = backtest_pair(train = train, 
                                  test = test, 
                                  coin_y = selected_pairs[["coin_y"]][i], 
                                  coin_x = selected_pairs[["coin_x"]][i], 
                                  threshold_z = threshold_z), 
      coin_y = selected_pairs[["coin_y"]][i], 
      coin_x = selected_pairs[["coin_x"]][i], 
      date_time = test[["date_time"]]
    )
    df <- bind_rows(df, single_pair)
  }
  df <- df %>% 
    group_by(date_time) %>% 
    summarise(return_strategy = mean(return_pair)) 
  return(df[["return_strategy"]])
} 
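
The equal weighting in backtest_strategy() is a mean of the per-pair cumulative return curves at each timestamp. The sketch below reproduces that step in base R on two hypothetical pairs with made-up cumulative returns.

```r
# Hypothetical cumulative returns for two pairs over three dates.
returns <- data.frame(
  date_time = rep(as.Date("2017-09-01") + 0:2, times = 2),
  pair = rep(c("pair_a", "pair_b"), each = 3),
  return_pair = c(1.00, 1.02, 1.04,
                  1.00, 0.98, 1.00)
)

# Mean cumulative return per timestamp, as in group_by() + summarise(mean()).
strategy <- aggregate(return_pair ~ date_time, data = returns, FUN = mean)

print(strategy)
```

Averaging cumulative curves corresponds to splitting capital equally across pairs at the start and not rebalancing afterwards, so pairs that have grown carry more weight later in the test window.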

11. Plot Single Function

Description
Create plots of a cointegration-based mean reversion trading strategy for a single coin pair composed of coin y and coin x. There are two plots created by this function. The first plot displays the spread transformed into a z-score in blue, with three red lines at -2, 0, and 2. A green line indicates the signal, which ranges from -1 to +1. The second plot displays the cumulative return of the model in blue. Two additional lines show the buy and hold return of coin y and coin x as red and green lines, respectively.

Arguments
train: A dataframe generated by prepare_data() that represents the training set for the coin pair.
test: A dataframe generated by prepare_data() that represents the test set for the coin pair.
coin_y: A string indicating the dependent coin in the coin pair regression.
coin_x: A string indicating the independent coin in the coin pair regression.
threshold_z: A number indicating the absolute value of the z-score threshold for entering a position in the spread.

plot_single <- function(train, test, coin_y, coin_x, threshold_z) { 
  model <- lm(log(train[[coin_y]]) ~ log(train[[coin_x]]))   
  intercept <- coef(model)[1] 
  hedge_ratio <- coef(model)[2] 
  df_plot <- test %>% 
    mutate(spread = log(test[[coin_y]]) - log(test[[coin_x]]) * hedge_ratio - intercept, 
           spread_z = (spread - mean(model[["residuals"]])) / sd(model[["residuals"]]), 
           signal = generate_signals(train = train, 
                                     test = test, 
                                     coin_y = coin_y, 
                                     coin_x = coin_x, 
                                     threshold_z = threshold_z), 
           return_pair = backtest_pair(train = train, 
                                       test = test, 
                                       coin_y = coin_y, 
                                       coin_x = coin_x, 
                                       threshold_z = threshold_z), 
           return_buyhold_y = test[[coin_y]] / test[[coin_y]][1], 
           return_buyhold_x = test[[coin_x]] / test[[coin_x]][1])
  print(summary(model)) 
  print(ggplot(df_plot, aes(x = date_time)) + 
          geom_line(aes(y = spread_z, colour = "Spread Z"), size = 1) + 
          geom_line(aes(y = signal, colour = "Signal"), size = 0.5) + 
          geom_hline(yintercept = 0, colour = "red", alpha = 0.5) + 
          geom_hline(yintercept = 2, colour = "red", alpha = 0.5) + 
          geom_hline(yintercept = -2, colour = "red", alpha = 0.5) + 
          scale_color_manual(name = "Series", 
                             values = c("Spread Z" = "blue", 
                                        "Signal" = "green")) + 
          labs(title = "Spread vs Trading Signal", 
               subtitle = str_c(coin_y, " and ", coin_x), 
               x = "Date", 
               y = "Spread and Signal")) 
  print(ggplot(df_plot, aes(x = date_time)) + 
          geom_line(aes(y = return_pair, colour = "Model"), size = 1) + 
          geom_line(aes(y = return_buyhold_y, colour = "Coin Y"), size = 0.5, alpha = 0.4) + 
          geom_line(aes(y = return_buyhold_x, colour = "Coin X"), size = 0.5, alpha = 0.4) + 
          geom_hline(yintercept = 1, colour = "black") + 
          scale_color_manual(name = "Return", 
                             values = c("Model" = "darkblue", 
                                        "Coin Y" = "darkred", 
                                        "Coin X" = "darkgreen")) + 
          labs(title = "Model Return vs Buy Hold Return", 
               subtitle = str_c(coin_y, " and ", coin_x), 
               x = "Date", 
               y = "Cumulative Return"))
} 
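
The spread and z-score formulas used in plot_single() and generate_signals() can be sanity-checked on the training data itself: there the spread equals the regression residuals, so its z-score has mean 0 and standard deviation 1. The sketch below uses simulated, hypothetical series in place of real coin prices.

```r
set.seed(1)
n <- 200
log_x <- 1 + cumsum(rnorm(n, sd = 0.01))          # hypothetical log price of coin x
log_y <- 0.5 + 0.8 * log_x + rnorm(n, sd = 0.02)  # hypothetical log price of coin y

model <- lm(log_y ~ log_x)
intercept <- coef(model)[[1]]
hedge_ratio <- coef(model)[[2]]

# Same formula as in the functions above, applied in-sample.
spread_train <- log_y - log_x * hedge_ratio - intercept
spread_z <- (spread_train - mean(residuals(model))) / sd(residuals(model))

print(c(mean = mean(spread_z), sd = sd(spread_z)))
```

On the test set the same intercept, hedge ratio, mean, and standard deviation are reused, so a z-score that drifts far from 0 there can indicate either a trading opportunity or a relationship that no longer holds.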

12. Plot Many Function

Description
Create many plots by calling the plot_single() function multiple times. Also creates a plot showing the results of the overall strategy. Creates a train and test set surrounding a cutoff date and creates plots for the top 10 selected coin pairs ranked by their ADF statistic.

Arguments
pricing_data: A dataframe containing pricing data from Poloniex gathered in tidy format.
time_resolution: The number of seconds that each observation spans. Takes values 300, 900, 1800, 7200, 14400, and 86400.
cutoff_date: A date representing the cutoff date between the train and test sets.
train_window: A period object from the lubridate package representing the length of time the train set covers.
test_window: A period object from the lubridate package representing the length of time the test set covers.
threshold_z: A number indicating the absolute value of the z-score threshold for entering a position in the spread.

plot_many <- function(pricing_data, time_resolution, cutoff_date, train_window, test_window, threshold_z) { 
  train <- prepare_data(pricing_data = pricing_data, 
                        time_resolution = time_resolution, 
                        start_date = as.Date(cutoff_date) - train_window, 
                        end_date = as.Date(cutoff_date)) 
  test <- prepare_data(pricing_data = pricing_data, 
                       time_resolution = time_resolution, 
                       start_date = as.Date(cutoff_date), 
                       end_date = as.Date(cutoff_date) + test_window) 
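  # quote_currency is not an argument of plot_many(); it is read from the global environment (see Set Parameters).
  selected_pairs <- select_pairs(train = train, 
                                 coin_pairs = create_pairs(quote_currency = quote_currency))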
  if (nrow(selected_pairs) == 0) { 
    return("No coin pairs selected.")
  } 
  print(selected_pairs) 
  for (i in 1:min(10, nrow(selected_pairs))) { 
    plot_single(train = train, 
                test = test, 
                coin_y = selected_pairs[["coin_y"]][i], 
                coin_x = selected_pairs[["coin_x"]][i], 
                threshold_z = threshold_z)
  } 
  test <- test %>% 
    mutate(return_strategy = backtest_strategy(train = train, 
                                               test = ., 
                                               selected_pairs = selected_pairs, 
                                               threshold_z = threshold_z)) 
  ggplot(test, aes(x = date_time)) + 
    geom_line(aes(y = return_strategy, colour = "Strategy"), size = 1) + 
    geom_line(aes(y = USDT_BTC / USDT_BTC[1], colour = "USDT_BTC"), size = 0.5, alpha = 0.4) + 
    geom_hline(yintercept = 1, colour = "black") + 
    scale_color_manual(name = "Return", 
                       values = c("Strategy" = "darkblue", 
                                  "USDT_BTC" = "darkred")) + 
    labs(title = "Strategy Return vs Buy Hold Return", 
         x = "Date", 
         y = "Cumulative Return") 
} 

13. Set Parameters

quote_currency <- "BTC" 
time_resolution <- 900
train_window <- days(32) 
test_window <- days(16) 
test_by <- "16 days"
threshold_z <- 2 
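
As a quick check of how these parameters translate into dates, the sketch below computes the train and test windows for the September 2017 cutoff in the same way plot_many() does. It assumes lubridate is among the loaded packages and that no 15-minute candles are missing from the data.

```r
library(lubridate)

cutoff_date <- as.Date("2017-09-01")
train_start <- cutoff_date - days(32)   # start of the train window
test_end <- cutoff_date + days(16)      # end of the test window

# Number of 900-second observations in the train window, both endpoints included.
n_train <- as.numeric(cutoff_date - train_start) * 86400 / 900 + 1

print(train_start)
print(test_end)
print(n_train)
```

The observation count is consistent with the 3071 residual degrees of freedom reported in the regression summaries below, since each regression estimates two coefficients.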

14. Cross Validation September 2017

plot_many(pricing_data = pricing_data, 
          time_resolution = time_resolution, 
          cutoff_date = "2017-09-01", 
          train_window = train_window, 
          test_window = test_window, 
          threshold_z = threshold_z) 
## # A tibble: 11 x 3
##     coin_y  coin_x  adf_stat
##      <chr>   <chr>     <dbl>
##  1 BTC_REP BTC_ZEC -4.963602
##  2 BTC_ZEC BTC_REP -4.867776
##  3 BTC_ETH BTC_ZEC -4.129661
##  4 BTC_XEM BTC_ZEC -3.828771
##  5 BTC_ZEC BTC_ETH -3.803343
##  6 BTC_ETH BTC_XEM -3.770694
##  7 BTC_ETH BTC_REP -3.689461
##  8 BTC_XEM BTC_ETH -3.640334
##  9 BTC_ZEC BTC_XEM -3.612975
## 10 BTC_ZEC BTC_LTC -3.504784
## 11 BTC_REP BTC_ETH -3.465348
## 
## Call:
## lm(formula = log(train[[coin_y]]) ~ log(train[[coin_x]]))
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -0.17650 -0.03399 -0.00319  0.02588  0.25218 
## 
## Coefficients:
##                       Estimate Std. Error t value            Pr(>|t|)    
## (Intercept)          -2.724288   0.023547  -115.7 <0.0000000000000002 ***
## log(train[[coin_x]])  0.866522   0.008338   103.9 <0.0000000000000002 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.05422 on 3071 degrees of freedom
## Multiple R-squared:  0.7786, Adjusted R-squared:  0.7785 
## F-statistic: 1.08e+04 on 1 and 3071 DF,  p-value: < 0.00000000000000022

## 
## Call:
## lm(formula = log(train[[coin_y]]) ~ log(train[[coin_x]]))
## 
## Residuals:
##       Min        1Q    Median        3Q       Max 
## -0.230094 -0.035929  0.004614  0.040786  0.160106 
## 
## Coefficients:
##                      Estimate Std. Error t value            Pr(>|t|)    
## (Intercept)          1.823132   0.044705   40.78 <0.0000000000000002 ***
## log(train[[coin_x]]) 0.898522   0.008646  103.92 <0.0000000000000002 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.05521 on 3071 degrees of freedom
## Multiple R-squared:  0.7786, Adjusted R-squared:  0.7785 
## F-statistic: 1.08e+04 on 1 and 3071 DF,  p-value: < 0.00000000000000022

## 
## Call:
## lm(formula = log(train[[coin_y]]) ~ log(train[[coin_x]]))
## 
## Residuals:
##       Min        1Q    Median        3Q       Max 
## -0.143948 -0.028154  0.003173  0.033042  0.080990 
## 
## Coefficients:
##                       Estimate Std. Error t value            Pr(>|t|)    
## (Intercept)          -1.178540   0.018659  -63.16 <0.0000000000000002 ***
## log(train[[coin_x]])  0.487972   0.006607   73.85 <0.0000000000000002 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.04296 on 3071 degrees of freedom
## Multiple R-squared:  0.6398, Adjusted R-squared:  0.6397 
## F-statistic:  5454 on 1 and 3071 DF,  p-value: < 0.00000000000000022

## 
## Call:
## lm(formula = log(train[[coin_y]]) ~ log(train[[coin_x]]))
## 
## Residuals:
##       Min        1Q    Median        3Q       Max 
## -0.209404 -0.059042  0.006272  0.059956  0.243530 
## 
## Coefficients:
##                      Estimate Std. Error t value            Pr(>|t|)    
## (Intercept)          -7.28043    0.03761 -193.60 <0.0000000000000002 ***
## log(train[[coin_x]])  0.82288    0.01332   61.79 <0.0000000000000002 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.08659 on 3071 degrees of freedom
## Multiple R-squared:  0.5542, Adjusted R-squared:  0.5541 
## F-statistic:  3818 on 1 and 3071 DF,  p-value: < 0.00000000000000022

## 
## Call:
## lm(formula = log(train[[coin_y]]) ~ log(train[[coin_x]]))
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -0.11038 -0.06027 -0.01465  0.05830  0.19164 
## 
## Coefficients:
##                      Estimate Std. Error t value            Pr(>|t|)    
## (Intercept)           0.52881    0.04538   11.65 <0.0000000000000002 ***
## log(train[[coin_x]])  1.31109    0.01775   73.85 <0.0000000000000002 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.07042 on 3071 degrees of freedom
## Multiple R-squared:  0.6398, Adjusted R-squared:  0.6397 
## F-statistic:  5454 on 1 and 3071 DF,  p-value: < 0.00000000000000022

## 
## Call:
## lm(formula = log(train[[coin_y]]) ~ log(train[[coin_x]]))
## 
## Residuals:
##       Min        1Q    Median        3Q       Max 
## -0.163273 -0.040941  0.009563  0.034808  0.135724 
## 
## Coefficients:
##                      Estimate Std. Error t value            Pr(>|t|)    
## (Intercept)           1.34669    0.06472   20.81 <0.0000000000000002 ***
## log(train[[coin_x]])  0.40637    0.00674   60.29 <0.0000000000000002 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.04844 on 3071 degrees of freedom
## Multiple R-squared:  0.5421, Adjusted R-squared:  0.5419 
## F-statistic:  3635 on 1 and 3071 DF,  p-value: < 0.00000000000000022

## 
## Call:
## lm(formula = log(train[[coin_y]]) ~ log(train[[coin_x]]))
## 
## Residuals:
##       Min        1Q    Median        3Q       Max 
## -0.159000 -0.032542 -0.002298  0.041504  0.138749 
## 
## Coefficients:
##                       Estimate Std. Error t value             Pr(>|t|)    
## (Intercept)          -0.328620   0.041764  -7.868  0.00000000000000493 ***
## log(train[[coin_x]])  0.430770   0.008078  53.330 < 0.0000000000000002 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.05158 on 3071 degrees of freedom
## Multiple R-squared:  0.4808, Adjusted R-squared:  0.4806 
## F-statistic:  2844 on 1 and 3071 DF,  p-value: < 0.00000000000000022

## 
## Call:
## lm(formula = log(train[[coin_y]]) ~ log(train[[coin_x]]))
## 
## Residuals:
##       Min        1Q    Median        3Q       Max 
## -0.205722 -0.070606 -0.007659  0.065672  0.251793 
## 
## Coefficients:
##                      Estimate Std. Error t value            Pr(>|t|)    
## (Intercept)          -6.19349    0.05656 -109.51 <0.0000000000000002 ***
## log(train[[coin_x]])  1.33394    0.02212   60.29 <0.0000000000000002 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.08776 on 3071 degrees of freedom
## Multiple R-squared:  0.5421, Adjusted R-squared:  0.5419 
## F-statistic:  3635 on 1 and 3071 DF,  p-value: < 0.00000000000000022

## 
## Call:
## lm(formula = log(train[[coin_y]]) ~ log(train[[coin_x]]))
## 
## Residuals:
##       Min        1Q    Median        3Q       Max 
## -0.190054 -0.064251 -0.005459  0.053331  0.207766 
## 
## Coefficients:
##                      Estimate Std. Error t value            Pr(>|t|)    
## (Intercept)            3.6459     0.1047   34.83 <0.0000000000000002 ***
## log(train[[coin_x]])   0.6735     0.0109   61.79 <0.0000000000000002 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.07834 on 3071 degrees of freedom
## Multiple R-squared:  0.5542, Adjusted R-squared:  0.5541 
## F-statistic:  3818 on 1 and 3071 DF,  p-value: < 0.00000000000000022

## 
## Call:
## lm(formula = log(train[[coin_y]]) ~ log(train[[coin_x]]))
## 
## Residuals:
##       Min        1Q    Median        3Q       Max 
## -0.164558 -0.041649 -0.007901  0.034900  0.203965 
## 
## Coefficients:
##                      Estimate Std. Error t value            Pr(>|t|)    
## (Intercept)          0.406535   0.035781   11.36 <0.0000000000000002 ***
## log(train[[coin_x]]) 0.740433   0.008203   90.26 <0.0000000000000002 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.06139 on 3071 degrees of freedom
## Multiple R-squared:  0.7262, Adjusted R-squared:  0.7261 
## F-statistic:  8147 on 1 and 3071 DF,  p-value: < 0.00000000000000022

15. Cross Validation August 2017

plot_many(pricing_data = pricing_data, 
          time_resolution = time_resolution, 
          cutoff_date = "2017-08-01", 
          train_window = train_window, 
          test_window = test_window, 
          threshold_z = threshold_z) 
## # A tibble: 9 x 3
##     coin_y   coin_x  adf_stat
##      <chr>    <chr>     <dbl>
## 1  BTC_ETH  BTC_ZEC -5.602468
## 2  BTC_ZEC  BTC_ETH -5.486184
## 3  BTC_ETH  BTC_REP -4.189460
## 4  BTC_ZEC  BTC_REP -3.981877
## 5  BTC_REP  BTC_ETH -3.942809
## 6  BTC_REP  BTC_ZEC -3.883980
## 7 BTC_DASH  BTC_XMR -3.677335
## 8  BTC_XEM  BTC_XMR -3.643599
## 9  BTC_XMR BTC_DASH -3.472615
## 
## Call:
## lm(formula = log(train[[coin_y]]) ~ log(train[[coin_x]]))
## 
## Residuals:
##       Min        1Q    Median        3Q       Max 
## -0.098475 -0.016144 -0.001821  0.013377  0.129858 
## 
## Coefficients:
##                       Estimate Std. Error t value            Pr(>|t|)    
## (Intercept)          -0.682007   0.006483  -105.2 <0.0000000000000002 ***
## log(train[[coin_x]])  0.708955   0.002631   269.4 <0.0000000000000002 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.02793 on 3071 degrees of freedom
## Multiple R-squared:  0.9594, Adjusted R-squared:  0.9594 
## F-statistic: 7.26e+04 on 1 and 3071 DF,  p-value: < 0.00000000000000022

## 
## Call:
## lm(formula = log(train[[coin_y]]) ~ log(train[[coin_x]]))
## 
## Residuals:
##       Min        1Q    Median        3Q       Max 
## -0.173140 -0.019467  0.000125  0.024995  0.129435 
## 
## Coefficients:
##                      Estimate Std. Error t value            Pr(>|t|)    
## (Intercept)          0.823251   0.012193   67.52 <0.0000000000000002 ***
## log(train[[coin_x]]) 1.353284   0.005022  269.45 <0.0000000000000002 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.03859 on 3071 degrees of freedom
## Multiple R-squared:  0.9594, Adjusted R-squared:  0.9594 
## F-statistic: 7.26e+04 on 1 and 3071 DF,  p-value: < 0.00000000000000022

## 
## Call:
## lm(formula = log(train[[coin_y]]) ~ log(train[[coin_x]]))
## 
## Residuals:
##       Min        1Q    Median        3Q       Max 
## -0.141739 -0.028546 -0.003488  0.027501  0.163335 
## 
## Coefficients:
##                      Estimate Std. Error t value            Pr(>|t|)    
## (Intercept)          1.030387   0.024938   41.32 <0.0000000000000002 ***
## log(train[[coin_x]]) 0.721545   0.005206  138.60 <0.0000000000000002 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.05148 on 3071 degrees of freedom
## Multiple R-squared:  0.8622, Adjusted R-squared:  0.8621 
## F-statistic: 1.921e+04 on 1 and 3071 DF,  p-value: < 0.00000000000000022

## 
## Call:
## lm(formula = log(train[[coin_y]]) ~ log(train[[coin_x]]))
## 
## Residuals:
##       Min        1Q    Median        3Q       Max 
## -0.233523 -0.025842 -0.000285  0.043309  0.235245 
## 
## Coefficients:
##                      Estimate Std. Error t value            Pr(>|t|)    
## (Intercept)          2.376439   0.031563   75.29 <0.0000000000000002 ***
## log(train[[coin_x]]) 1.009625   0.006589  153.24 <0.0000000000000002 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.06515 on 3071 degrees of freedom
## Multiple R-squared:  0.8843, Adjusted R-squared:  0.8843 
## F-statistic: 2.348e+04 on 1 and 3071 DF,  p-value: < 0.00000000000000022

## 
## Call:
## lm(formula = log(train[[coin_y]]) ~ log(train[[coin_x]]))
## 
## Residuals:
##       Min        1Q    Median        3Q       Max 
## -0.205892 -0.038271 -0.005142  0.044267  0.174785 
## 
## Coefficients:
##                       Estimate Std. Error t value            Pr(>|t|)    
## (Intercept)          -1.890978   0.020929  -90.35 <0.0000000000000002 ***
## log(train[[coin_x]])  1.194903   0.008621  138.60 <0.0000000000000002 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.06625 on 3071 degrees of freedom
## Multiple R-squared:  0.8622, Adjusted R-squared:  0.8621 
## F-statistic: 1.921e+04 on 1 and 3071 DF,  p-value: < 0.00000000000000022

## 
## Call:
## lm(formula = log(train[[coin_y]]) ~ log(train[[coin_x]]))
## 
## Residuals:
##       Min        1Q    Median        3Q       Max 
## -0.227419 -0.039564 -0.008293  0.029624  0.222612 
## 
## Coefficients:
##                       Estimate Std. Error t value            Pr(>|t|)    
## (Intercept)          -2.635217   0.014085  -187.1 <0.0000000000000002 ***
## log(train[[coin_x]])  0.875909   0.005716   153.2 <0.0000000000000002 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.06069 on 3071 degrees of freedom
## Multiple R-squared:  0.8843, Adjusted R-squared:  0.8843 
## F-statistic: 2.348e+04 on 1 and 3071 DF,  p-value: < 0.00000000000000022

## 
## Call:
## lm(formula = log(train[[coin_y]]) ~ log(train[[coin_x]]))
## 
## Residuals:
##       Min        1Q    Median        3Q       Max 
## -0.122832 -0.042883  0.005057  0.030852  0.151922 
## 
## Coefficients:
##                      Estimate Std. Error t value            Pr(>|t|)    
## (Intercept)           0.01629    0.05052   0.322               0.747    
## log(train[[coin_x]])  0.64442    0.01230  52.410 <0.0000000000000002 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.04625 on 3071 degrees of freedom
## Multiple R-squared:  0.4721, Adjusted R-squared:  0.472 
## F-statistic:  2747 on 1 and 3071 DF,  p-value: < 0.00000000000000022

## 
## Call:
## lm(formula = log(train[[coin_y]]) ~ log(train[[coin_x]]))
## 
## Residuals:
##       Min        1Q    Median        3Q       Max 
## -0.237287 -0.044656  0.003621  0.050134  0.193754 
## 
## Coefficients:
##                      Estimate Std. Error t value            Pr(>|t|)    
## (Intercept)          -6.83351    0.07513  -90.96 <0.0000000000000002 ***
## log(train[[coin_x]])  0.70534    0.01829   38.58 <0.0000000000000002 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.06878 on 3071 degrees of freedom
## Multiple R-squared:  0.3264, Adjusted R-squared:  0.3262 
## F-statistic:  1488 on 1 and 3071 DF,  p-value: < 0.00000000000000022

## 
## Call:
## lm(formula = log(train[[coin_y]]) ~ log(train[[coin_x]]))
## 
## Residuals:
##       Min        1Q    Median        3Q       Max 
## -0.138292 -0.029814 -0.005207  0.040320  0.110065 
## 
## Coefficients:
##                      Estimate Std. Error t value            Pr(>|t|)    
## (Intercept)          -2.18042    0.03679  -59.27 <0.0000000000000002 ***
## log(train[[coin_x]])  0.73266    0.01398   52.41 <0.0000000000000002 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.04931 on 3071 degrees of freedom
## Multiple R-squared:  0.4721, Adjusted R-squared:  0.472 
## F-statistic:  2747 on 1 and 3071 DF,  p-value: < 0.00000000000000022

16. Cross Validation July 2017

plot_many(pricing_data = pricing_data, 
          time_resolution = time_resolution, 
          cutoff_date = "2017-07-01", 
          train_window = train_window, 
          test_window = test_window, 
          threshold_z = threshold_z) 
## # A tibble: 3 x 3
##    coin_y  coin_x  adf_stat
##     <chr>   <chr>     <dbl>
## 1 BTC_XMR BTC_REP -4.731286
## 2 BTC_REP BTC_XMR -4.559391
## 3 BTC_XMR BTC_ETH -3.477126
## 
## Call:
## lm(formula = log(train[[coin_y]]) ~ log(train[[coin_x]]))
## 
## Residuals:
##       Min        1Q    Median        3Q       Max 
## -0.105170 -0.020197 -0.000205  0.024449  0.147395 
## 
## Coefficients:
##                      Estimate Std. Error t value            Pr(>|t|)    
## (Intercept)          -2.28443    0.03367  -67.84 <0.0000000000000002 ***
## log(train[[coin_x]])  0.37806    0.00751   50.34 <0.0000000000000002 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.03532 on 3071 degrees of freedom
## Multiple R-squared:  0.4522, Adjusted R-squared:  0.452 
## F-statistic:  2535 on 1 and 3071 DF,  p-value: < 0.00000000000000022

## 
## Call:
## lm(formula = log(train[[coin_y]]) ~ log(train[[coin_x]]))
## 
## Residuals:
##       Min        1Q    Median        3Q       Max 
## -0.204531 -0.028456  0.006383  0.036683  0.169144 
## 
## Coefficients:
##                      Estimate Std. Error t value             Pr(>|t|)    
## (Intercept)           0.27585    0.09454   2.918              0.00355 ** 
## log(train[[coin_x]])  1.19597    0.02376  50.345 < 0.0000000000000002 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.06281 on 3071 degrees of freedom
## Multiple R-squared:  0.4522, Adjusted R-squared:  0.452 
## F-statistic:  2535 on 1 and 3071 DF,  p-value: < 0.00000000000000022

## 
## Call:
## lm(formula = log(train[[coin_y]]) ~ log(train[[coin_x]]))
## 
## Residuals:
##       Min        1Q    Median        3Q       Max 
## -0.097654 -0.023953 -0.003966  0.023561  0.188412 
## 
## Coefficients:
##                       Estimate Std. Error t value            Pr(>|t|)    
## (Intercept)          -3.725166   0.010454  -356.3 <0.0000000000000002 ***
## log(train[[coin_x]])  0.116588   0.004779    24.4 <0.0000000000000002 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.04367 on 3071 degrees of freedom
## Multiple R-squared:  0.1623, Adjusted R-squared:  0.1621 
## F-statistic: 595.1 on 1 and 3071 DF,  p-value: < 0.00000000000000022
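
Note on the swapped pairs: the first two fits in the July 2017 output are the same pair (BTC_XMR and BTC_REP) with the dependent and independent roles exchanged. That is why both report a Multiple R-squared of 0.4522 and a slope t value of about 50.34. In a simple regression with an intercept, the product of the two slopes equals R-squared: 0.37806 x 1.19597 is about 0.452. The residuals differ between the two directions, so the ADF statistics differ as well (-4.731286 versus -4.559391). The sketch below checks the identity on simulated data; `x`, `y`, and the simulation parameters are illustrative assumptions, not values from the pricing data.

```r
# Simulated log prices; x and y are hypothetical stand-ins, not the Poloniex data.
set.seed(1)
x <- cumsum(rnorm(500, sd = 0.01))      # random-walk regressor
y <- 0.5 * x + rnorm(500, sd = 0.02)    # linear in x plus noise

fit_yx <- lm(y ~ x)   # y regressed on x
fit_xy <- lm(x ~ y)   # roles swapped

slope_yx <- unname(coef(fit_yx)[2])
slope_xy <- unname(coef(fit_xy)[2])
r2_yx <- summary(fit_yx)$r.squared
r2_xy <- summary(fit_xy)$r.squared

# Both directions share one R-squared, and it equals the product of the slopes.
identity_holds <- isTRUE(all.equal(r2_yx, r2_xy)) &&
  isTRUE(all.equal(slope_yx * slope_xy, r2_yx))
print(identity_holds)

# Same product using the rounded slopes printed in the July 2017 output.
print(round(0.37806 * 1.19597, 3))
```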

17. Cross Validation June 2017

plot_many(pricing_data = pricing_data, 
          time_resolution = time_resolution, 
          cutoff_date = "2017-06-01", 
          train_window = train_window, 
          test_window = test_window, 
          threshold_z = threshold_z) 
## # A tibble: 9 x 3
##     coin_y   coin_x  adf_stat
##      <chr>    <chr>     <dbl>
## 1  BTC_REP BTC_DASH -5.585866
## 2 BTC_DASH  BTC_REP -5.575748
## 3  BTC_XMR  BTC_ZEC -4.142801
## 4  BTC_XMR  BTC_REP -3.959087
## 5  BTC_REP  BTC_XMR -3.803859
## 6  BTC_XMR BTC_DASH -3.718922
## 7 BTC_DASH  BTC_XMR -3.583567
## 8  BTC_XMR  BTC_ETH -3.580314
## 9  BTC_ZEC  BTC_ETH -3.540196
## 
## Call:
## lm(formula = log(train[[coin_y]]) ~ log(train[[coin_x]]))
## 
## Residuals:
##       Min        1Q    Median        3Q       Max 
## -0.303124 -0.046684  0.002596  0.039329  0.279482 
## 
## Coefficients:
##                      Estimate Std. Error t value            Pr(>|t|)    
## (Intercept)          -1.86207    0.02944  -63.26 <0.0000000000000002 ***
## log(train[[coin_x]])  0.94355    0.01012   93.21 <0.0000000000000002 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.06565 on 3071 degrees of freedom
## Multiple R-squared:  0.7389, Adjusted R-squared:  0.7388 
## F-statistic:  8689 on 1 and 3071 DF,  p-value: < 0.00000000000000022

## 
## Call:
## lm(formula = log(train[[coin_y]]) ~ log(train[[coin_x]]))
## 
## Residuals:
##       Min        1Q    Median        3Q       Max 
## -0.251513 -0.044657 -0.003911  0.043419  0.280357 
## 
## Coefficients:
##                      Estimate Std. Error t value            Pr(>|t|)    
## (Intercept)          0.699322   0.038690   18.07 <0.0000000000000002 ***
## log(train[[coin_x]]) 0.783065   0.008401   93.21 <0.0000000000000002 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.05981 on 3071 degrees of freedom
## Multiple R-squared:  0.7389, Adjusted R-squared:  0.7388 
## F-statistic:  8689 on 1 and 3071 DF,  p-value: < 0.00000000000000022

## 
## Call:
## lm(formula = log(train[[coin_y]]) ~ log(train[[coin_x]]))
## 
## Residuals:
##       Min        1Q    Median        3Q       Max 
## -0.185801 -0.044324 -0.007035  0.027913  0.263842 
## 
## Coefficients:
##                       Estimate Std. Error t value            Pr(>|t|)    
## (Intercept)          -3.423057   0.012484 -274.20 <0.0000000000000002 ***
## log(train[[coin_x]])  0.238733   0.004615   51.73 <0.0000000000000002 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.06513 on 3071 degrees of freedom
## Multiple R-squared:  0.4657, Adjusted R-squared:  0.4655 
## F-statistic:  2676 on 1 and 3071 DF,  p-value: < 0.00000000000000022

## 
## Call:
## lm(formula = log(train[[coin_y]]) ~ log(train[[coin_x]]))
## 
## Residuals:
##       Min        1Q    Median        3Q       Max 
## -0.170510 -0.045684 -0.005116  0.038648  0.269216 
## 
## Coefficients:
##                      Estimate Std. Error t value            Pr(>|t|)    
## (Intercept)          -2.19105    0.04666  -46.96 <0.0000000000000002 ***
## log(train[[coin_x]])  0.40727    0.01013   40.20 <0.0000000000000002 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.07212 on 3071 degrees of freedom
## Multiple R-squared:  0.3448, Adjusted R-squared:  0.3446 
## F-statistic:  1616 on 1 and 3071 DF,  p-value: < 0.00000000000000022

## 
## Call:
## lm(formula = log(train[[coin_y]]) ~ log(train[[coin_x]]))
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -0.21261 -0.07797 -0.01731  0.05047  0.32412 
## 
## Coefficients:
##                      Estimate Std. Error t value            Pr(>|t|)    
## (Intercept)          -1.16127    0.08565  -13.56 <0.0000000000000002 ***
## log(train[[coin_x]])  0.84665    0.02106   40.20 <0.0000000000000002 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.104 on 3071 degrees of freedom
## Multiple R-squared:  0.3448, Adjusted R-squared:  0.3446 
## F-statistic:  1616 on 1 and 3071 DF,  p-value: < 0.00000000000000022

## 
## Call:
## lm(formula = log(train[[coin_y]]) ~ log(train[[coin_x]]))
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -0.14842 -0.04589 -0.00320  0.03037  0.34510 
## 
## Coefficients:
##                      Estimate Std. Error t value            Pr(>|t|)    
## (Intercept)          -2.81142    0.03291  -85.44 <0.0000000000000002 ***
## log(train[[coin_x]])  0.43177    0.01132   38.16 <0.0000000000000002 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.07338 on 3071 degrees of freedom
## Multiple R-squared:  0.3216, Adjusted R-squared:  0.3214 
## F-statistic:  1456 on 1 and 3071 DF,  p-value: < 0.00000000000000022

## 
## Call:
## lm(formula = log(train[[coin_y]]) ~ log(train[[coin_x]]))
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -0.22882 -0.07094 -0.02948  0.07389  0.27235 
## 
## Coefficients:
##                      Estimate Std. Error t value            Pr(>|t|)    
## (Intercept)           0.12311    0.07940   1.551               0.121    
## log(train[[coin_x]])  0.74491    0.01952  38.158 <0.0000000000000002 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.09639 on 3071 degrees of freedom
## Multiple R-squared:  0.3216, Adjusted R-squared:  0.3214 
## F-statistic:  1456 on 1 and 3071 DF,  p-value: < 0.00000000000000022

## 
## Call:
## lm(formula = log(train[[coin_y]]) ~ log(train[[coin_x]]))
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -0.17106 -0.04795 -0.01283  0.02951  0.32739 
## 
## Coefficients:
##                       Estimate Std. Error t value            Pr(>|t|)    
## (Intercept)          -3.388707   0.018289 -185.28 <0.0000000000000002 ***
## log(train[[coin_x]])  0.242131   0.006521   37.13 <0.0000000000000002 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.07402 on 3071 degrees of freedom
## Multiple R-squared:  0.3099, Adjusted R-squared:  0.3096 
## F-statistic:  1379 on 1 and 3071 DF,  p-value: < 0.00000000000000022

## 
## Call:
## lm(formula = log(train[[coin_y]]) ~ log(train[[coin_x]]))
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -0.49026 -0.05646  0.01091  0.06882  0.52385 
## 
## Coefficients:
##                      Estimate Std. Error t value            Pr(>|t|)    
## (Intercept)           0.28478    0.03251    8.76 <0.0000000000000002 ***
## log(train[[coin_x]])  1.06460    0.01159   91.85 <0.0000000000000002 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.1316 on 3071 degrees of freedom
## Multiple R-squared:  0.7331, Adjusted R-squared:  0.7331 
## F-statistic:  8437 on 1 and 3071 DF,  p-value: < 0.00000000000000022

18. Cross Validation May 2017

plot_many(pricing_data = pricing_data, 
          time_resolution = time_resolution, 
          cutoff_date = "2017-05-01", 
          train_window = train_window, 
          test_window = test_window, 
          threshold_z = threshold_z) 
## # A tibble: 18 x 3
##      coin_y   coin_x  adf_stat
##       <chr>    <chr>     <dbl>
##  1  BTC_REP  BTC_ETH -5.006093
##  2  BTC_LTC  BTC_XEM -4.747154
##  3 BTC_DASH  BTC_ETH -4.634350
##  4 BTC_DASH  BTC_REP -4.496507
##  5 BTC_DASH  BTC_ZEC -4.460635
##  6  BTC_ETH  BTC_REP -4.405342
##  7  BTC_LTC  BTC_ZEC -4.340114
##  8  BTC_REP  BTC_ZEC -4.181057
##  9  BTC_ZEC  BTC_ETH -3.922604
## 10  BTC_LTC  BTC_XMR -3.921014
## 11  BTC_LTC  BTC_REP -3.901344
## 12  BTC_LTC  BTC_ETH -3.855233
## 13  BTC_ZEC  BTC_REP -3.815340
## 14  BTC_XMR  BTC_LTC -3.674490
## 15  BTC_ETH  BTC_ZEC -3.608956
## 16  BTC_REP BTC_DASH -3.562364
## 17 BTC_DASH  BTC_XEM -3.479815
## 18  BTC_LTC BTC_DASH -3.439666
## 
## Call:
## lm(formula = log(train[[coin_y]]) ~ log(train[[coin_x]]))
## 
## Residuals:
##       Min        1Q    Median        3Q       Max 
## -0.243207 -0.043248 -0.004972  0.042142  0.283352 
## 
## Coefficients:
##                      Estimate Std. Error t value            Pr(>|t|)    
## (Intercept)          -2.26795    0.03746  -60.55 <0.0000000000000002 ***
## log(train[[coin_x]])  0.74116    0.01174   63.15 <0.0000000000000002 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.07188 on 3071 degrees of freedom
## Multiple R-squared:  0.5649, Adjusted R-squared:  0.5648 
## F-statistic:  3987 on 1 and 3071 DF,  p-value: < 0.00000000000000022

## 
## Call:
## lm(formula = log(train[[coin_y]]) ~ log(train[[coin_x]]))
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -0.64070 -0.07878 -0.01065  0.10224  0.33519 
## 
## Coefficients:
##                      Estimate Std. Error t value             Pr(>|t|)    
## (Intercept)          0.295022   0.083067   3.552             0.000389 ***
## log(train[[coin_x]]) 0.462342   0.007707  59.992 < 0.0000000000000002 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.1346 on 3071 degrees of freedom
## Multiple R-squared:  0.5396, Adjusted R-squared:  0.5394 
## F-statistic:  3599 on 1 and 3071 DF,  p-value: < 0.00000000000000022

## 
## Call:
## lm(formula = log(train[[coin_y]]) ~ log(train[[coin_x]]))
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -0.40788 -0.02868  0.00549  0.03510  0.18551 
## 
## Coefficients:
##                      Estimate Std. Error t value            Pr(>|t|)    
## (Intercept)          -0.71347    0.03286  -21.71 <0.0000000000000002 ***
## log(train[[coin_x]])  0.66051    0.01030   64.15 <0.0000000000000002 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.06305 on 3071 degrees of freedom
## Multiple R-squared:  0.5726, Adjusted R-squared:  0.5725 
## F-statistic:  4115 on 1 and 3071 DF,  p-value: < 0.00000000000000022

## 
## Call:
## lm(formula = log(train[[coin_y]]) ~ log(train[[coin_x]]))
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -0.43255 -0.04479 -0.00292  0.03431  0.34540 
## 
## Coefficients:
##                      Estimate Std. Error t value            Pr(>|t|)    
## (Intercept)          -0.90420    0.06542  -13.82 <0.0000000000000002 ***
## log(train[[coin_x]])  0.41362    0.01412   29.29 <0.0000000000000002 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.08527 on 3071 degrees of freedom
## Multiple R-squared:  0.2184, Adjusted R-squared:  0.2181 
## F-statistic: 857.9 on 1 and 3071 DF,  p-value: < 0.00000000000000022

## 
## Call:
## lm(formula = log(train[[coin_y]]) ~ log(train[[coin_x]]))
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -0.37359 -0.04167 -0.01106  0.03466  0.26104 
## 
## Coefficients:
##                      Estimate Std. Error t value            Pr(>|t|)    
## (Intercept)          -0.72366    0.04622  -15.66 <0.0000000000000002 ***
## log(train[[coin_x]])  0.72872    0.01606   45.37 <0.0000000000000002 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.07463 on 3071 degrees of freedom
## Multiple R-squared:  0.4013, Adjusted R-squared:  0.4011 
## F-statistic:  2058 on 1 and 3071 DF,  p-value: < 0.00000000000000022

## 
## Call:
## lm(formula = log(train[[coin_y]]) ~ log(train[[coin_x]]))
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -0.23111 -0.03741 -0.01738  0.02099  0.27348 
## 
## Coefficients:
##                      Estimate Std. Error t value             Pr(>|t|)    
## (Intercept)           0.34105    0.05592   6.099         0.0000000012 ***
## log(train[[coin_x]])  0.76220    0.01207  63.145 < 0.0000000000000002 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.07289 on 3071 degrees of freedom
## Multiple R-squared:  0.5649, Adjusted R-squared:  0.5648 
## F-statistic:  3987 on 1 and 3071 DF,  p-value: < 0.00000000000000022

## 
## Call:
## lm(formula = log(train[[coin_y]]) ~ log(train[[coin_x]]))
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -0.85383 -0.10088  0.02582  0.12960  0.34575 
## 
## Coefficients:
##                      Estimate Std. Error t value            Pr(>|t|)    
## (Intercept)          -3.24885    0.12013  -27.05 <0.0000000000000002 ***
## log(train[[coin_x]])  0.49966    0.04174   11.97 <0.0000000000000002 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.1939 on 3071 degrees of freedom
## Multiple R-squared:  0.04458,    Adjusted R-squared:  0.04427 
## F-statistic: 143.3 on 1 and 3071 DF,  p-value: < 0.00000000000000022

## 
## Call:
## lm(formula = log(train[[coin_y]]) ~ log(train[[coin_x]]))
## 
## Residuals:
##       Min        1Q    Median        3Q       Max 
## -0.256927 -0.056269  0.000649  0.043824  0.284026 
## 
## Coefficients:
##                      Estimate Std. Error t value            Pr(>|t|)    
## (Intercept)          -2.10764    0.04979  -42.33 <0.0000000000000002 ***
## log(train[[coin_x]])  0.87740    0.01730   50.72 <0.0000000000000002 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.08038 on 3071 degrees of freedom
## Multiple R-squared:  0.4558, Adjusted R-squared:  0.4556 
## F-statistic:  2572 on 1 and 3071 DF,  p-value: < 0.00000000000000022

## 
## Call:
## lm(formula = log(train[[coin_y]]) ~ log(train[[coin_x]]))
## 
## Residuals:
##       Min        1Q    Median        3Q       Max 
## -0.136863 -0.039752 -0.003195  0.034642  0.235201 
## 
## Coefficients:
##                       Estimate Std. Error t value            Pr(>|t|)    
## (Intercept)          -0.971391   0.026937  -36.06 <0.0000000000000002 ***
## log(train[[coin_x]])  0.597421   0.008441   70.77 <0.0000000000000002 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.05169 on 3071 degrees of freedom
## Multiple R-squared:  0.6199, Adjusted R-squared:  0.6198 
## F-statistic:  5009 on 1 and 3071 DF,  p-value: < 0.00000000000000022

## 
## Call:
## lm(formula = log(train[[coin_y]]) ~ log(train[[coin_x]]))
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -0.47762 -0.09886 -0.00977  0.10808  0.43601 
## 
## Coefficients:
##                      Estimate Std. Error t value            Pr(>|t|)    
## (Intercept)          -13.9785     0.1758  -79.49 <0.0000000000000002 ***
## log(train[[coin_x]])  -2.2882     0.0433  -52.85 <0.0000000000000002 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.1436 on 3071 degrees of freedom
## Multiple R-squared:  0.4763, Adjusted R-squared:  0.4761 
## F-statistic:  2793 on 1 and 3071 DF,  p-value: < 0.00000000000000022
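
Note on the repeated calls: each cross-validation section calls plot_many() with identical arguments except for cutoff_date. A loop over the cutoff dates is one possible way to express this. In the minimal sketch below, `run_all()` and `fake_plot_many()` are hypothetical names introduced only for illustration, and the threshold value 2 is a placeholder, not the notebook's setting.

```r
# Monthly cutoff dates from July 2017 back to January 2017, as character strings.
cutoff_dates <- format(seq(as.Date("2017-07-01"), by = "-1 month", length.out = 7))

# run_all() is a hypothetical helper: it calls `fun` once per cutoff date and
# forwards all remaining arguments unchanged.
run_all <- function(cutoff_dates, fun, ...) {
  results <- lapply(cutoff_dates, function(d) fun(cutoff_date = d, ...))
  names(results) <- cutoff_dates
  results
}

# Stand-in for plot_many() so that the sketch runs on its own.
fake_plot_many <- function(cutoff_date, threshold_z) {
  paste("cutoff", cutoff_date, "threshold", threshold_z)
}

# threshold_z = 2 is a placeholder value for the stand-in function.
runs <- run_all(cutoff_dates, fake_plot_many, threshold_z = 2)
print(names(runs))
```

With the notebook's own objects in scope, the stand-in would be replaced by plot_many and the remaining arguments (pricing_data, time_resolution, train_window, test_window, threshold_z) passed through the dots.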

19. Cross Validation April 2017

plot_many(pricing_data = pricing_data,
          time_resolution = time_resolution,
          cutoff_date = "2017-04-01",
          train_window = train_window,
          test_window = test_window,
          threshold_z = threshold_z)
## # A tibble: 8 x 3
##     coin_y   coin_x  adf_stat
##      <chr>    <chr>     <dbl>
## 1  BTC_XEM  BTC_ZEC -4.121186
## 2  BTC_ZEC  BTC_XEM -3.946709
## 3  BTC_REP  BTC_ETH -3.923221
## 4  BTC_ETH  BTC_REP -3.919118
## 5 BTC_DASH  BTC_XMR -3.846020
## 6  BTC_XMR  BTC_ZEC -3.725802
## 7  BTC_XMR BTC_DASH -3.692261
## 8  BTC_ZEC  BTC_XMR -3.663936
## 
## Call:
## lm(formula = log(train[[coin_y]]) ~ log(train[[coin_x]]))
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -0.42059 -0.03847  0.01868  0.07308  0.25551 
## 
## Coefficients:
##                       Estimate Std. Error t value            Pr(>|t|)    
## (Intercept)          -8.955500   0.018855  -475.0 <0.0000000000000002 ***
## log(train[[coin_x]])  0.814683   0.005997   135.9 <0.0000000000000002 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.1103 on 3071 degrees of freedom
## Multiple R-squared:  0.8573, Adjusted R-squared:  0.8573 
## F-statistic: 1.846e+04 on 1 and 3071 DF,  p-value: < 0.00000000000000022

## 
## Call:
## lm(formula = log(train[[coin_y]]) ~ log(train[[coin_x]]))
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -0.31897 -0.07513 -0.01755  0.06639  0.52712 
## 
## Coefficients:
##                      Estimate Std. Error t value            Pr(>|t|)    
## (Intercept)          8.978369   0.089133   100.7 <0.0000000000000002 ***
## log(train[[coin_x]]) 1.052359   0.007746   135.9 <0.0000000000000002 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.1253 on 3071 degrees of freedom
## Multiple R-squared:  0.8573, Adjusted R-squared:  0.8573 
## F-statistic: 1.846e+04 on 1 and 3071 DF,  p-value: < 0.00000000000000022

## 
## Call:
## lm(formula = log(train[[coin_y]]) ~ log(train[[coin_x]]))
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -0.20181 -0.04849 -0.01266  0.03608  0.37271 
## 
## Coefficients:
##                       Estimate Std. Error t value            Pr(>|t|)    
## (Intercept)          -2.953760   0.009316  -317.1 <0.0000000000000002 ***
## log(train[[coin_x]])  0.577941   0.002548   226.8 <0.0000000000000002 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.07335 on 3071 degrees of freedom
## Multiple R-squared:  0.9437, Adjusted R-squared:  0.9437 
## F-statistic: 5.145e+04 on 1 and 3071 DF,  p-value: < 0.00000000000000022

## 
## Call:
## lm(formula = log(train[[coin_y]]) ~ log(train[[coin_x]]))
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -0.57861 -0.06992  0.01310  0.09656  0.35437 
## 
## Coefficients:
##                      Estimate Std. Error t value            Pr(>|t|)    
## (Intercept)          4.619085   0.036388   126.9 <0.0000000000000002 ***
## log(train[[coin_x]]) 1.632817   0.007199   226.8 <0.0000000000000002 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.1233 on 3071 degrees of freedom
## Multiple R-squared:  0.9437, Adjusted R-squared:  0.9437 
## F-statistic: 5.145e+04 on 1 and 3071 DF,  p-value: < 0.00000000000000022

## 
## Call:
## lm(formula = log(train[[coin_y]]) ~ log(train[[coin_x]]))
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -0.33841 -0.08243 -0.00249  0.07335  0.37893 
## 
## Coefficients:
##                      Estimate Std. Error t value            Pr(>|t|)    
## (Intercept)          3.241328   0.035723   90.73 <0.0000000000000002 ***
## log(train[[coin_x]]) 1.443311   0.008526  169.29 <0.0000000000000002 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.1365 on 3071 degrees of freedom
## Multiple R-squared:  0.9032, Adjusted R-squared:  0.9032 
## F-statistic: 2.866e+04 on 1 and 3071 DF,  p-value: < 0.00000000000000022

## 
## Call:
## lm(formula = log(train[[coin_y]]) ~ log(train[[coin_x]]))
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -0.24816 -0.04974 -0.01505  0.03430  0.40928 
## 
## Coefficients:
##                       Estimate Std. Error t value            Pr(>|t|)    
## (Intercept)          -1.637464   0.017595  -93.06 <0.0000000000000002 ***
## log(train[[coin_x]])  0.813237   0.005596  145.31 <0.0000000000000002 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.1029 on 3071 degrees of freedom
## Multiple R-squared:  0.873,  Adjusted R-squared:  0.873 
## F-statistic: 2.112e+04 on 1 and 3071 DF,  p-value: < 0.00000000000000022

## 
## Call:
## lm(formula = log(train[[coin_y]]) ~ log(train[[coin_x]]))
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -0.27888 -0.05304  0.01216  0.05960  0.24929 
## 
## Coefficients:
##                       Estimate Std. Error t value            Pr(>|t|)    
## (Intercept)          -2.432975   0.010447  -232.9 <0.0000000000000002 ***
## log(train[[coin_x]])  0.625792   0.003697   169.3 <0.0000000000000002 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.08986 on 3071 degrees of freedom
## Multiple R-squared:  0.9032, Adjusted R-squared:  0.9032 
## F-statistic: 2.866e+04 on 1 and 3071 DF,  p-value: < 0.00000000000000022

## 
## Call:
## lm(formula = log(train[[coin_y]]) ~ log(train[[coin_x]]))
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -0.43774 -0.05981  0.03444  0.07622  0.33734 
## 
## Coefficients:
##                      Estimate Std. Error t value            Pr(>|t|)    
## (Intercept)          1.360900   0.030954   43.97 <0.0000000000000002 ***
## log(train[[coin_x]]) 1.073528   0.007388  145.31 <0.0000000000000002 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.1183 on 3071 degrees of freedom
## Multiple R-squared:  0.873,  Adjusted R-squared:  0.873 
## F-statistic: 2.112e+04 on 1 and 3071 DF,  p-value: < 0.00000000000000022

20. Cross Validation March 2017

plot_many(pricing_data = pricing_data, 
          time_resolution = time_resolution, 
          cutoff_date = "2017-03-01", 
          train_window = train_window, 
          test_window = test_window, 
          threshold_z = threshold_z) 
## # A tibble: 2 x 3
##    coin_y  coin_x  adf_stat
##     <chr>   <chr>     <dbl>
## 1 BTC_XMR BTC_LTC -4.163767
## 2 BTC_LTC BTC_XMR -4.030214
## 
## Call:
## lm(formula = log(train[[coin_y]]) ~ log(train[[coin_x]]))
## 
## Residuals:
##       Min        1Q    Median        3Q       Max 
## -0.096961 -0.023558 -0.006907  0.013749  0.107417 
## 
## Coefficients:
##                      Estimate Std. Error t value            Pr(>|t|)    
## (Intercept)          1.741186   0.045673   38.12 <0.0000000000000002 ***
## log(train[[coin_x]]) 1.101011   0.008171  134.75 <0.0000000000000002 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.03604 on 3071 degrees of freedom
## Multiple R-squared:  0.8553, Adjusted R-squared:  0.8553 
## F-statistic: 1.816e+04 on 1 and 3071 DF,  p-value: < 0.00000000000000022

## 
## Call:
## lm(formula = log(train[[coin_y]]) ~ log(train[[coin_x]]))
## 
## Residuals:
##       Min        1Q    Median        3Q       Max 
## -0.082026 -0.012154  0.006332  0.019831  0.095291 
## 
## Coefficients:
##                       Estimate Std. Error t value            Pr(>|t|)    
## (Intercept)          -2.161233   0.025446  -84.94 <0.0000000000000002 ***
## log(train[[coin_x]])  0.776862   0.005765  134.75 <0.0000000000000002 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.03027 on 3071 degrees of freedom
## Multiple R-squared:  0.8553, Adjusted R-squared:  0.8553 
## F-statistic: 1.816e+04 on 1 and 3071 DF,  p-value: < 0.00000000000000022
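
Note on the adf_stat column: the statistic comes from the two-step Engle-Granger procedure described in section 4 (a log-log regression, then an ADF test on the residuals with a non-zero mean, no trend, and one autoregressive lag). The sketch below reproduces that specification by hand in base R on simulated data. `eg_adf_stat()` is a hypothetical name, and the notebook's own test_cointegration() may well rely on a package instead, so treat this as an illustration of the specification rather than the original implementation. Critical values for residual-based tests are more negative than the standard ADF tables.

```r
# Engle-Granger statistic by hand: (1) log-log regression, (2) ADF regression
# on the residuals with an intercept, no trend, and one lagged difference.
eg_adf_stat <- function(coin_y, coin_x) {
  e  <- residuals(lm(log(coin_y) ~ log(coin_x)))
  de <- diff(e)                  # de[i] = e[i + 1] - e[i]
  n  <- length(de)
  d_now <- de[2:n]               # change at time t
  e_lag <- e[2:n]                # residual level at time t - 1
  d_lag <- de[1:(n - 1)]         # change at time t - 1
  adf_fit <- lm(d_now ~ e_lag + d_lag)
  # The test statistic is the t value on the lagged level.
  unname(coef(summary(adf_fit))["e_lag", "t value"])
}

# Simulated prices (illustrative assumptions, not the Poloniex data).
set.seed(42)
n_obs <- 1000
log_x <- cumsum(rnorm(n_obs, sd = 0.01))               # random walk
log_y <- 0.3 + 0.8 * log_x + rnorm(n_obs, sd = 0.02)   # stationary spread around x
log_z <- cumsum(rnorm(n_obs, sd = 0.01))               # unrelated random walk

stat_cointegrated <- eg_adf_stat(exp(log_y), exp(log_x))
stat_independent  <- eg_adf_stat(exp(log_z), exp(log_x))
print(c(cointegrated = stat_cointegrated, independent = stat_independent))
```

A strongly negative statistic points to mean-reverting residuals; the unrelated random walk should produce a statistic much closer to zero.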

21. Cross Validation February 2017

plot_many(pricing_data = pricing_data, 
          time_resolution = time_resolution, 
          cutoff_date = "2017-02-01", 
          train_window = train_window, 
          test_window = test_window, 
          threshold_z = threshold_z) 
## # A tibble: 3 x 3
##    coin_y  coin_x  adf_stat
##     <chr>   <chr>     <dbl>
## 1 BTC_REP BTC_ETH -4.550570
## 2 BTC_ETH BTC_REP -3.841857
## 3 BTC_REP BTC_ZEC -3.523153
## 
## Call:
## lm(formula = log(train[[coin_y]]) ~ log(train[[coin_x]]))
## 
## Residuals:
##       Min        1Q    Median        3Q       Max 
## -0.102880 -0.034125 -0.004815  0.026554  0.179239 
## 
## Coefficients:
##                       Estimate Std. Error t value            Pr(>|t|)    
## (Intercept)          -2.644839   0.032683  -80.92 <0.0000000000000002 ***
## log(train[[coin_x]])  0.600623   0.007245   82.90 <0.0000000000000002 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.04532 on 3071 degrees of freedom
## Multiple R-squared:  0.6911, Adjusted R-squared:  0.691 
## F-statistic:  6872 on 1 and 3071 DF,  p-value: < 0.00000000000000022

## 
## Call:
## lm(formula = log(train[[coin_y]]) ~ log(train[[coin_x]]))
## 
## Residuals:
##       Min        1Q    Median        3Q       Max 
## -0.198335 -0.037258  0.008305  0.046103  0.132195 
## 
## Coefficients:
##                      Estimate Std. Error t value            Pr(>|t|)    
## (Intercept)           1.65058    0.07432   22.21 <0.0000000000000002 ***
## log(train[[coin_x]])  1.15069    0.01388   82.90 <0.0000000000000002 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.06274 on 3071 degrees of freedom
## Multiple R-squared:  0.6911, Adjusted R-squared:  0.691 
## F-statistic:  6872 on 1 and 3071 DF,  p-value: < 0.00000000000000022

## 
## Call:
## lm(formula = log(train[[coin_y]]) ~ log(train[[coin_x]]))
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -0.25900 -0.01622  0.02158  0.05104  0.17160 
## 
## Coefficients:
##                      Estimate Std. Error t value            Pr(>|t|)    
## (Intercept)          -4.46839    0.05429  -82.31 <0.0000000000000002 ***
## log(train[[coin_x]])  0.29221    0.01792   16.31 <0.0000000000000002 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.07824 on 3071 degrees of freedom
## Multiple R-squared:  0.07968,    Adjusted R-squared:  0.07938 
## F-statistic: 265.9 on 1 and 3071 DF,  p-value: < 0.00000000000000022

22. Cross Validation January 2017

plot_many(pricing_data = pricing_data, 
          time_resolution = time_resolution, 
          cutoff_date = "2017-01-01", 
          train_window = train_window, 
          test_window = test_window, 
          threshold_z = threshold_z) 
## # A tibble: 5 x 3
##     coin_y   coin_x  adf_stat
##      <chr>    <chr>     <dbl>
## 1  BTC_XEM  BTC_ETH -3.913761
## 2 BTC_DASH  BTC_XEM -3.861305
## 3  BTC_REP  BTC_ETH -3.843512
## 4  BTC_XEM BTC_DASH -3.827549
## 5  BTC_REP  BTC_ZEC -3.485578
## 
## Call:
## lm(formula = log(train[[coin_y]]) ~ log(train[[coin_x]]))
## 
## Residuals:
##       Min        1Q    Median        3Q       Max 
## -0.130263 -0.042095 -0.000095  0.036790  0.154270 
## 
## Coefficients:
##                       Estimate Std. Error t value            Pr(>|t|)    
## (Intercept)          -9.737010   0.038216 -254.79 <0.0000000000000002 ***
## log(train[[coin_x]])  0.564951   0.008219   68.73 <0.0000000000000002 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.05436 on 3071 degrees of freedom
## Multiple R-squared:  0.606,  Adjusted R-squared:  0.6059 
## F-statistic:  4724 on 1 and 3071 DF,  p-value: < 0.00000000000000022

## 
## Call:
## lm(formula = log(train[[coin_y]]) ~ log(train[[coin_x]]))
## 
## Residuals:
##       Min        1Q    Median        3Q       Max 
## -0.088829 -0.033901 -0.003252  0.027641  0.098851 
## 
## Coefficients:
##                      Estimate Std. Error t value            Pr(>|t|)    
## (Intercept)          0.930214   0.098801   9.415 <0.0000000000000002 ***
## log(train[[coin_x]]) 0.435756   0.007992  54.527 <0.0000000000000002 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.03835 on 3071 degrees of freedom
## Multiple R-squared:  0.4919, Adjusted R-squared:  0.4917 
## F-statistic:  2973 on 1 and 3071 DF,  p-value: < 0.00000000000000022

## 
## Call:
## lm(formula = log(train[[coin_y]]) ~ log(train[[coin_x]]))
## 
## Residuals:
##       Min        1Q    Median        3Q       Max 
## -0.228945 -0.059397 -0.008747  0.064865  0.226972 
## 
## Coefficients:
##                      Estimate Std. Error t value            Pr(>|t|)    
## (Intercept)          -1.85655    0.05487  -33.84 <0.0000000000000002 ***
## log(train[[coin_x]])  0.79705    0.01180   67.54 <0.0000000000000002 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.07804 on 3071 degrees of freedom
## Multiple R-squared:  0.5977, Adjusted R-squared:  0.5975 
## F-statistic:  4562 on 1 and 3071 DF,  p-value: < 0.00000000000000022

## 
## Call:
## lm(formula = log(train[[coin_y]]) ~ log(train[[coin_x]]))
## 
## Residuals:
##       Min        1Q    Median        3Q       Max 
## -0.176723 -0.037859 -0.006745  0.041624  0.155113 
## 
## Coefficients:
##                      Estimate Std. Error t value            Pr(>|t|)    
## (Intercept)          -7.33155    0.09228  -79.45 <0.0000000000000002 ***
## log(train[[coin_x]])  1.12887    0.02070   54.53 <0.0000000000000002 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.06173 on 3071 degrees of freedom
## Multiple R-squared:  0.4919, Adjusted R-squared:  0.4917 
## F-statistic:  2973 on 1 and 3071 DF,  p-value: < 0.00000000000000022

## 
## Call:
## lm(formula = log(train[[coin_y]]) ~ log(train[[coin_x]]))
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -0.38003 -0.01884  0.02013  0.05096  0.18371 
## 
## Coefficients:
##                       Estimate Std. Error t value            Pr(>|t|)    
## (Intercept)          -4.384261   0.022303 -196.58 <0.0000000000000002 ***
## log(train[[coin_x]])  0.414442   0.007833   52.91 <0.0000000000000002 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.08899 on 3071 degrees of freedom
## Multiple R-squared:  0.4769, Adjusted R-squared:  0.4767 
## F-statistic:  2799 on 1 and 3071 DF,  p-value: < 0.00000000000000022
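
The regression summaries above appear to be the first-stage fits for the five selected pairs, with the slope on log(train[[coin_x]]) acting as the hedge ratio. As a rough sketch of how such a fit connects to the ADF statistic in the table, the base R example below runs both Engle-Granger steps on simulated data. The simulated series, the seed, and every variable name are assumptions made for illustration. The ADF regression is written out by hand with lm() and is not taken from the original test_cointegration function.

```r
# Illustrative only: simulated prices, not the Poloniex data used above.
set.seed(1)
n <- 500
log_x <- cumsum(rnorm(n, sd = 0.01))                         # random-walk log price
noise <- as.numeric(arima.sim(list(ar = 0.9), n = n, sd = 0.02))  # stationary AR(1) spread
log_y <- -2 + 0.6 * log_x + noise                            # cointegrated with log_x

# Step 1: regress log(y) on log(x); the slope is the hedge ratio.
fit <- lm(log_y ~ log_x)
res <- residuals(fit)

# Step 2: ADF regression on the residuals with a constant, no trend, one lag:
#   diff(e)_t = a + gamma * e_(t-1) + phi * diff(e)_(t-1) + u_t
d_res   <- diff(res)
d_now   <- d_res[2:(n - 1)]   # diff(e)_t     for t = 3..n
lag_res <- res[2:(n - 1)]     # e_(t-1)       for t = 3..n
d_lag   <- d_res[1:(n - 2)]   # diff(e)_(t-1) for t = 3..n
adf_fit <- lm(d_now ~ lag_res + d_lag)

# The ADF statistic is the t value on the lagged residual level.
adf_stat <- summary(adf_fit)$coefficients["lag_res", "t value"]
print(adf_stat)
```

Because the statistic is computed on estimated residuals, it is compared against Engle-Granger (MacKinnon) critical values, not the standard Dickey-Fuller tables.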

23. Cross Validation Full

Description
Walk-forward cross validation of the strategy. For each cutoff date between 2017-01-01 and 2017-10-01, spaced test_by apart, pairs are selected on the train_window before the cutoff and the strategy is backtested on the test_window after it. The per-period returns from all test sets are then chained into one cumulative return series.

cutoff_dates <- seq(ymd("2017-01-01"), ymd("2017-10-01"), by = test_by)
results <- tibble() 
for (cutoff_date in cutoff_dates) { 
  # Looping over a Date vector yields bare numbers, so restore the Date class. 
  cutoff_date <- as.Date(cutoff_date, origin = "1970-01-01") 
  print("Cross validating strategy.")
  print(str_c("Using train set from ", cutoff_date - train_window, " to ", cutoff_date, ".")) 
  print(str_c("Using test set from ", cutoff_date, " to ", cutoff_date + test_window, ".")) 
  train <- prepare_data(pricing_data = pricing_data, 
                        time_resolution = time_resolution, 
                        start_date = cutoff_date - train_window, 
                        end_date = cutoff_date) 
  test <- prepare_data(pricing_data = pricing_data, 
                       time_resolution = time_resolution, 
                       start_date = cutoff_date, 
                       end_date = cutoff_date + test_window) 
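  # Select pairs on the train set only, then backtest them on the test set. 
  selected_pairs <- select_pairs(train = train, 
                                 coin_pairs = create_pairs(quote_currency = quote_currency)) 
  test <- test %>% 
    mutate(return_strategy = 
             backtest_strategy(train = train, 
                               test = test, 
                               selected_pairs = selected_pairs, 
                               threshold_z = threshold_z), 
           return_strategy_change = return_strategy / lag(return_strategy, 1) - 1) %>% 
    # Replace NAs (including the first change of each fold) with 0. 
    # ifelse() drops the datetime class, so date_time is restored after the loop. 
    mutate_all(funs(ifelse(is.na(.), 0, .)))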
  results <- bind_rows(results, test) 
} 
## [1] "Cross validating strategy."
## [1] "Using train set from 2016-11-30 to 2017-01-01."
## [1] "Using test set from 2017-01-01 to 2017-01-17."
## [1] "Cross validating strategy."
## [1] "Using train set from 2016-12-16 to 2017-01-17."
## [1] "Using test set from 2017-01-17 to 2017-02-02."
## [1] "Cross validating strategy."
## [1] "Using train set from 2017-01-01 to 2017-02-02."
## [1] "Using test set from 2017-02-02 to 2017-02-18."
## [1] "Cross validating strategy."
## [1] "Using train set from 2017-01-17 to 2017-02-18."
## [1] "Using test set from 2017-02-18 to 2017-03-06."
## [1] "Cross validating strategy."
## [1] "Using train set from 2017-02-02 to 2017-03-06."
## [1] "Using test set from 2017-03-06 to 2017-03-22."
## [1] "Cross validating strategy."
## [1] "Using train set from 2017-02-18 to 2017-03-22."
## [1] "Using test set from 2017-03-22 to 2017-04-07."
## [1] "Cross validating strategy."
## [1] "Using train set from 2017-03-06 to 2017-04-07."
## [1] "Using test set from 2017-04-07 to 2017-04-23."
## [1] "Cross validating strategy."
## [1] "Using train set from 2017-03-22 to 2017-04-23."
## [1] "Using test set from 2017-04-23 to 2017-05-09."
## [1] "Cross validating strategy."
## [1] "Using train set from 2017-04-07 to 2017-05-09."
## [1] "Using test set from 2017-05-09 to 2017-05-25."
## [1] "Cross validating strategy."
## [1] "Using train set from 2017-04-23 to 2017-05-25."
## [1] "Using test set from 2017-05-25 to 2017-06-10."
## [1] "Cross validating strategy."
## [1] "Using train set from 2017-05-09 to 2017-06-10."
## [1] "Using test set from 2017-06-10 to 2017-06-26."
## [1] "Cross validating strategy."
## [1] "Using train set from 2017-05-25 to 2017-06-26."
## [1] "Using test set from 2017-06-26 to 2017-07-12."
## [1] "Cross validating strategy."
## [1] "Using train set from 2017-06-10 to 2017-07-12."
## [1] "Using test set from 2017-07-12 to 2017-07-28."
## [1] "Cross validating strategy."
## [1] "Using train set from 2017-06-26 to 2017-07-28."
## [1] "Using test set from 2017-07-28 to 2017-08-13."
## [1] "Cross validating strategy."
## [1] "Using train set from 2017-07-12 to 2017-08-13."
## [1] "Using test set from 2017-08-13 to 2017-08-29."
## [1] "Cross validating strategy."
## [1] "Using train set from 2017-07-28 to 2017-08-29."
## [1] "Using test set from 2017-08-29 to 2017-09-14."
## [1] "Cross validating strategy."
## [1] "Using train set from 2017-08-13 to 2017-09-14."
## [1] "Using test set from 2017-09-14 to 2017-09-30."
## [1] "Cross validating strategy."
## [1] "Using train set from 2017-08-29 to 2017-09-30."
## [1] "Using test set from 2017-09-30 to 2017-10-16."
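
The fold boundaries printed above follow a fixed rolling schedule. A minimal sketch that rebuilds it in base R, assuming train_window = 32 days, test_window = 16 days, and test_by = 16 days. These values are inferred from the printed dates and are not taken from the original parameter definitions.

```r
# Assumed values, inferred from the printed train and test dates.
train_window <- 32  # days
test_window  <- 16  # days
test_by      <- 16  # days between consecutive cutoffs

# A numeric `by` in seq() on Dates is interpreted as a number of days.
cutoffs <- seq(as.Date("2017-01-01"), as.Date("2017-10-01"), by = test_by)

# One row per fold: train on the window before the cutoff, test on the window after.
schedule <- data.frame(
  train_start = cutoffs - train_window,
  train_end   = cutoffs,
  test_start  = cutoffs,
  test_end    = cutoffs + test_window
)

# Iterating by row index keeps the Date class intact inside the loop.
for (i in seq_len(nrow(schedule))) {
  cat("Fold", i, ": train", format(schedule$train_start[i]),
      "to", format(schedule$train_end[i]),
      "| test", format(schedule$test_start[i]),
      "to", format(schedule$test_end[i]), "\n")
}
```

Each test window starts where the previous one ends, so the test sets tile the period without gaps. Since prepare_data filters with >= and <= on both ends, an observation that falls exactly on a cutoff would land in both the train and the test set of that fold.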
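# Chain the per-period changes across all folds into one cumulative return, 
# and restore the datetime class that was dropped inside the loop. 
results <- results %>% 
  mutate(return_strategy_cumulative = cumprod(1 + return_strategy_change), 
         date_time = as.POSIXct(date_time, origin = "1970-01-01")) 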
ggplot(results, aes(x = date_time)) + 
  geom_line(aes(y = return_strategy_cumulative), colour = "blue", size = 1) + 
  geom_hline(yintercept = 1, colour = "black") + 
  labs(title = "Cross-Validated Strategy Return", x = "Date", y = "Cumulative Return") 

print(results[["return_strategy_cumulative"]][nrow(results)]) 
## [1] 1.176044
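
The final value of 1.176044 corresponds to a cumulative gain of about 17.6% over the chained test sets. As a small worked sketch of why converting each fold to period-over-period changes and then applying cumprod() stitches the folds together, assume two hypothetical folds whose within-fold return index restarts at 1. The numbers and the helper to_change are invented for illustration.

```r
# Hypothetical within-fold cumulative return indices (each restarts at 1).
fold_1 <- c(1.00, 1.02, 1.01)
fold_2 <- c(1.00, 0.99, 1.03)

# Period-over-period change, with the undefined first change set to 0,
# mirroring the lag() and NA-to-0 steps in the loop above.
to_change <- function(x) {
  change <- x / c(NA, head(x, -1)) - 1
  change[is.na(change)] <- 0
  change
}

# Stack the folds, then compound the changes across fold boundaries.
changes    <- c(to_change(fold_1), to_change(fold_2))
cumulative <- cumprod(1 + changes)

# The end value equals the product of the two fold end values.
print(tail(cumulative, 1))
print(1.01 * 1.03)
```

Setting the first change of each fold to 0 means a new fold starts from the level at which the previous one ended, so the combined curve has no jump at the fold boundary.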